Introduction¶
As you learned in the previous lessons, YOLO is a state-of-the-art, real-time object detection algorithm. In this notebook, we will apply the YOLO algorithm to detect objects in images. We have provided a series of images that you can test the YOLO algorithm on.
Importing Resources¶
We will start by loading the required packages into Python. We will be using OpenCV to load our images, matplotlib to plot them, a utils module that contains some helper functions, and a modified version of Darknet. YOLO uses Darknet, an open-source deep neural network framework written by the creators of YOLO. The version of Darknet used in this notebook has been modified to work with PyTorch 0.4 and has been simplified because we won't be doing any training. Instead, we will be using a set of pre-trained weights that were trained on the Common Objects in Context (COCO) dataset. For more information on Darknet, please visit Darknet.
pip install opencv-python
Collecting opencv-python
Note: you may need to restart the kernel to use updated packages.
Downloading opencv_python-4.10.0.84-cp37-abi3-win_amd64.whl.metadata (20 kB)
Requirement already satisfied: numpy>=1.21.2 in c:\users\hieub\miniconda3\lib\site-packages (from opencv-python) (1.26.4)
Downloading opencv_python-4.10.0.84-cp37-abi3-win_amd64.whl (38.8 MB)
Installing collected packages: opencv-python
Successfully installed opencv-python-4.10.0.84
import cv2
import matplotlib.pyplot as plt
pip install torch
Requirement already satisfied: torch in c:\users\hieub\miniconda3\lib\site-packages (2.5.1)
Requirement already satisfied: filelock in c:\users\hieub\miniconda3\lib\site-packages (from torch) (3.16.1)
Requirement already satisfied: typing-extensions>=4.8.0 in c:\users\hieub\appdata\roaming\python\python312\site-packages (from torch) (4.12.2)
Requirement already satisfied: networkx in c:\users\hieub\miniconda3\lib\site-packages (from torch) (3.4.2)
Requirement already satisfied: jinja2 in c:\users\hieub\miniconda3\lib\site-packages (from torch) (3.1.4)
Requirement already satisfied: fsspec in c:\users\hieub\miniconda3\lib\site-packages (from torch) (2024.10.0)
Requirement already satisfied: setuptools in c:\users\hieub\miniconda3\lib\site-packages (from torch) (75.1.0)
Requirement already satisfied: sympy==1.13.1 in c:\users\hieub\miniconda3\lib\site-packages (from torch) (1.13.1)
Requirement already satisfied: mpmath<1.4,>=1.1.0 in c:\users\hieub\miniconda3\lib\site-packages (from sympy==1.13.1->torch) (1.3.0)
Requirement already satisfied: MarkupSafe>=2.0 in c:\users\hieub\appdata\roaming\python\python312\site-packages (from jinja2->torch) (3.0.2)
Note: you may need to restart the kernel to use updated packages.
pwd()
'D:\\1. Lectures\\5-PRP201c_Python programming\\2_Ipynb'
pip install patool
Collecting patool
Using cached patool-3.1.0-py2.py3-none-any.whl.metadata (4.3 kB)
Using cached patool-3.1.0-py2.py3-none-any.whl (98 kB)
Installing collected packages: patool
Successfully installed patool-3.1.0
Note: you may need to restart the kernel to use updated packages.
import patoolib
in_dir = 'Datasets/Yolo.zip'
out_dir = 'Datasets'
patoolib.extract_archive(in_dir, outdir=out_dir)
INFO patool: Extracting Datasets/Yolo.zip ...
INFO patool: could not find a 'file' executable, falling back to guess mime type by file extension
INFO patool: ... Datasets/Yolo.zip extracted to `Datasets'.
'Datasets'
pwd()
'D:\\1. Lectures\\5-PRP201c_Python programming\\2_Ipynb'
%run Datasets/Yolo/utils.py
%run Datasets/Yolo/darknet.py
# Set the location and name of the cfg file
cfg_file = 'Datasets/Yolo/cfg/yolov3.cfg'
# Set the location and name of the pre-trained weights file
weight_file = 'Datasets/Yolo/weights/yolov3.weights'
# Set the location and name of the COCO object classes file
namesfile = 'Datasets/Yolo/data/coco.names'
# Load the network architecture
m = Darknet(cfg_file)
# Load the pre-trained weights
m.load_weights(weight_file)
# Load the COCO object classes
class_names = load_class_names(namesfile)
Loading weights. Please Wait...100.00% Complete
print(class_names)
print(len(class_names))
['person', 'bicycle', 'car', 'motorbike', 'aeroplane', 'bus', 'train', 'truck', 'boat', 'traffic light', 'fire hydrant', 'stop sign', 'parking meter', 'bench', 'bird', 'cat', 'dog', 'horse', 'sheep', 'cow', 'elephant', 'bear', 'zebra', 'giraffe', 'backpack', 'umbrella', 'handbag', 'tie', 'suitcase', 'frisbee', 'skis', 'snowboard', 'sports ball', 'kite', 'baseball bat', 'baseball glove', 'skateboard', 'surfboard', 'tennis racket', 'bottle', 'wine glass', 'cup', 'fork', 'knife', 'spoon', 'bowl', 'banana', 'apple', 'sandwich', 'orange', 'broccoli', 'carrot', 'hot dog', 'pizza', 'donut', 'cake', 'chair', 'sofa', 'pottedplant', 'bed', 'diningtable', 'toilet', 'tvmonitor', 'laptop', 'mouse', 'remote', 'keyboard', 'cell phone', 'microwave', 'oven', 'toaster', 'sink', 'refrigerator', 'book', 'clock', 'vase', 'scissors', 'teddy bear', 'hair drier', 'toothbrush']
80
Taking a Look at the Neural Network¶
# Print the neural network used in YOLOv3
m.print_network()
layer filters size input output
0 conv 32 3 x 3 / 1 416 x 416 x 3 -> 416 x 416 x 32
1 conv 64 3 x 3 / 2 416 x 416 x 32 -> 208 x 208 x 64
2 conv 32 1 x 1 / 1 208 x 208 x 64 -> 208 x 208 x 32
3 conv 64 3 x 3 / 1 208 x 208 x 32 -> 208 x 208 x 64
4 shortcut 1
5 conv 128 3 x 3 / 2 208 x 208 x 64 -> 104 x 104 x 128
6 conv 64 1 x 1 / 1 104 x 104 x 128 -> 104 x 104 x 64
7 conv 128 3 x 3 / 1 104 x 104 x 64 -> 104 x 104 x 128
8 shortcut 5
9 conv 64 1 x 1 / 1 104 x 104 x 128 -> 104 x 104 x 64
10 conv 128 3 x 3 / 1 104 x 104 x 64 -> 104 x 104 x 128
11 shortcut 8
12 conv 256 3 x 3 / 2 104 x 104 x 128 -> 52 x 52 x 256
13 conv 128 1 x 1 / 1 52 x 52 x 256 -> 52 x 52 x 128
14 conv 256 3 x 3 / 1 52 x 52 x 128 -> 52 x 52 x 256
15 shortcut 12
16 conv 128 1 x 1 / 1 52 x 52 x 256 -> 52 x 52 x 128
17 conv 256 3 x 3 / 1 52 x 52 x 128 -> 52 x 52 x 256
18 shortcut 15
19 conv 128 1 x 1 / 1 52 x 52 x 256 -> 52 x 52 x 128
20 conv 256 3 x 3 / 1 52 x 52 x 128 -> 52 x 52 x 256
21 shortcut 18
22 conv 128 1 x 1 / 1 52 x 52 x 256 -> 52 x 52 x 128
23 conv 256 3 x 3 / 1 52 x 52 x 128 -> 52 x 52 x 256
24 shortcut 21
25 conv 128 1 x 1 / 1 52 x 52 x 256 -> 52 x 52 x 128
26 conv 256 3 x 3 / 1 52 x 52 x 128 -> 52 x 52 x 256
27 shortcut 24
28 conv 128 1 x 1 / 1 52 x 52 x 256 -> 52 x 52 x 128
29 conv 256 3 x 3 / 1 52 x 52 x 128 -> 52 x 52 x 256
30 shortcut 27
31 conv 128 1 x 1 / 1 52 x 52 x 256 -> 52 x 52 x 128
32 conv 256 3 x 3 / 1 52 x 52 x 128 -> 52 x 52 x 256
33 shortcut 30
34 conv 128 1 x 1 / 1 52 x 52 x 256 -> 52 x 52 x 128
35 conv 256 3 x 3 / 1 52 x 52 x 128 -> 52 x 52 x 256
36 shortcut 33
37 conv 512 3 x 3 / 2 52 x 52 x 256 -> 26 x 26 x 512
38 conv 256 1 x 1 / 1 26 x 26 x 512 -> 26 x 26 x 256
39 conv 512 3 x 3 / 1 26 x 26 x 256 -> 26 x 26 x 512
40 shortcut 37
41 conv 256 1 x 1 / 1 26 x 26 x 512 -> 26 x 26 x 256
42 conv 512 3 x 3 / 1 26 x 26 x 256 -> 26 x 26 x 512
43 shortcut 40
44 conv 256 1 x 1 / 1 26 x 26 x 512 -> 26 x 26 x 256
45 conv 512 3 x 3 / 1 26 x 26 x 256 -> 26 x 26 x 512
46 shortcut 43
47 conv 256 1 x 1 / 1 26 x 26 x 512 -> 26 x 26 x 256
48 conv 512 3 x 3 / 1 26 x 26 x 256 -> 26 x 26 x 512
49 shortcut 46
50 conv 256 1 x 1 / 1 26 x 26 x 512 -> 26 x 26 x 256
51 conv 512 3 x 3 / 1 26 x 26 x 256 -> 26 x 26 x 512
52 shortcut 49
53 conv 256 1 x 1 / 1 26 x 26 x 512 -> 26 x 26 x 256
54 conv 512 3 x 3 / 1 26 x 26 x 256 -> 26 x 26 x 512
55 shortcut 52
56 conv 256 1 x 1 / 1 26 x 26 x 512 -> 26 x 26 x 256
57 conv 512 3 x 3 / 1 26 x 26 x 256 -> 26 x 26 x 512
58 shortcut 55
59 conv 256 1 x 1 / 1 26 x 26 x 512 -> 26 x 26 x 256
60 conv 512 3 x 3 / 1 26 x 26 x 256 -> 26 x 26 x 512
61 shortcut 58
62 conv 1024 3 x 3 / 2 26 x 26 x 512 -> 13 x 13 x1024
63 conv 512 1 x 1 / 1 13 x 13 x1024 -> 13 x 13 x 512
64 conv 1024 3 x 3 / 1 13 x 13 x 512 -> 13 x 13 x1024
65 shortcut 62
66 conv 512 1 x 1 / 1 13 x 13 x1024 -> 13 x 13 x 512
67 conv 1024 3 x 3 / 1 13 x 13 x 512 -> 13 x 13 x1024
68 shortcut 65
69 conv 512 1 x 1 / 1 13 x 13 x1024 -> 13 x 13 x 512
70 conv 1024 3 x 3 / 1 13 x 13 x 512 -> 13 x 13 x1024
71 shortcut 68
72 conv 512 1 x 1 / 1 13 x 13 x1024 -> 13 x 13 x 512
73 conv 1024 3 x 3 / 1 13 x 13 x 512 -> 13 x 13 x1024
74 shortcut 71
75 conv 512 1 x 1 / 1 13 x 13 x1024 -> 13 x 13 x 512
76 conv 1024 3 x 3 / 1 13 x 13 x 512 -> 13 x 13 x1024
77 conv 512 1 x 1 / 1 13 x 13 x1024 -> 13 x 13 x 512
78 conv 1024 3 x 3 / 1 13 x 13 x 512 -> 13 x 13 x1024
79 conv 512 1 x 1 / 1 13 x 13 x1024 -> 13 x 13 x 512
80 conv 1024 3 x 3 / 1 13 x 13 x 512 -> 13 x 13 x1024
81 conv 255 1 x 1 / 1 13 x 13 x1024 -> 13 x 13 x 255
82 detection
83 route 79
84 conv 256 1 x 1 / 1 13 x 13 x 512 -> 13 x 13 x 256
85 upsample * 2 13 x 13 x 256 -> 26 x 26 x 256
86 route 85 61
87 conv 256 1 x 1 / 1 26 x 26 x 768 -> 26 x 26 x 256
88 conv 512 3 x 3 / 1 26 x 26 x 256 -> 26 x 26 x 512
89 conv 256 1 x 1 / 1 26 x 26 x 512 -> 26 x 26 x 256
90 conv 512 3 x 3 / 1 26 x 26 x 256 -> 26 x 26 x 512
91 conv 256 1 x 1 / 1 26 x 26 x 512 -> 26 x 26 x 256
92 conv 512 3 x 3 / 1 26 x 26 x 256 -> 26 x 26 x 512
93 conv 255 1 x 1 / 1 26 x 26 x 512 -> 26 x 26 x 255
94 detection
95 route 91
96 conv 128 1 x 1 / 1 26 x 26 x 256 -> 26 x 26 x 128
97 upsample * 2 26 x 26 x 128 -> 52 x 52 x 128
98 route 97 36
99 conv 128 1 x 1 / 1 52 x 52 x 384 -> 52 x 52 x 128
100 conv 256 3 x 3 / 1 52 x 52 x 128 -> 52 x 52 x 256
101 conv 128 1 x 1 / 1 52 x 52 x 256 -> 52 x 52 x 128
102 conv 256 3 x 3 / 1 52 x 52 x 128 -> 52 x 52 x 256
103 conv 128 1 x 1 / 1 52 x 52 x 256 -> 52 x 52 x 128
104 conv 256 3 x 3 / 1 52 x 52 x 128 -> 52 x 52 x 256
105 conv 255 1 x 1 / 1 52 x 52 x 256 -> 52 x 52 x 255
106 detection
As we can see, the neural network used by YOLOv3 consists mainly of convolutional layers, with some shortcut connections and upsample layers. For a full description of this network please refer to the YOLOv3 Paper.
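The shortcut layers in the table above are residual connections: the output of a pair of convolutions is added element-wise back to an earlier layer's output. The block below is a minimal illustrative sketch of this conv → conv → shortcut pattern in PyTorch; the class name and channel sizes are our own, not taken from darknet.py.

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    """A 1x1 conv that halves the channels, a 3x3 conv that restores
    them, and a shortcut that adds the result back to the input."""
    def __init__(self, channels):
        super().__init__()
        self.reduce = nn.Conv2d(channels, channels // 2, kernel_size=1)
        self.expand = nn.Conv2d(channels // 2, channels, kernel_size=3, padding=1)

    def forward(self, x):
        # The "shortcut": element-wise addition of input and conv output,
        # so the spatial size and channel count are unchanged.
        return x + self.expand(self.reduce(x))

block = ResidualBlock(64)
out = block(torch.randn(1, 64, 104, 104))
print(out.shape)  # torch.Size([1, 64, 104, 104])
```

Because the addition requires matching shapes, each shortcut in the table connects layers with identical output dimensions (e.g. layer 4 adds back to layer 1).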
Loading and Resizing Our Images¶
In the code below, we load our images using OpenCV's cv2.imread() function. Since this function loads images in BGR order, we convert them to RGB so they display with the correct colors.
As we can see in the previous cell, the input size of the first layer of the network is 416 x 416 x 3. Since images have different sizes, we have to resize our images to be compatible with the input size of the first layer in the network. In the code below, we resize our images using OpenCV's cv2.resize() function. We then plot the original and resized images.
# Set the default figure size
plt.rcParams['figure.figsize'] = [24.0, 14.0]
# Load the image
img = cv2.imread('Datasets/Yolo/images/city_scene.jpg')
# Convert the image to RGB
original_image = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
# We resize the image to the input width and height of the first layer of the network.
print(m.width, m.height)  # the input size expected by the Darknet model m
resized_image = cv2.resize(original_image, (m.width, m.height))
# Display the images
plt.subplot(121)
plt.title('Original Image')
plt.imshow(original_image)
plt.subplot(122)
plt.title('Resized Image')
plt.imshow(resized_image)
plt.show()
416 416
Setting the Non-Maximal Suppression Threshold¶
As you learned in the previous lessons, YOLO uses Non-Maximal Suppression (NMS) to keep only the best bounding boxes. The first step in NMS is to remove all predicted bounding boxes whose detection probability is less than a given NMS threshold. In the code below, we set this NMS threshold to 0.3. This means that all predicted bounding boxes with a detection probability below 0.3 will be removed.
# Set the NMS threshold
nms_thresh = 0.3
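This first NMS step can be sketched in plain Python. The helper below is illustrative, not the actual implementation in the utils module; it assumes the 7-parameter box format described later in this notebook, with the detection confidence at index 4.

```python
def filter_by_confidence(boxes, nms_thresh):
    """Step 1 of NMS: keep only boxes whose detection confidence
    (index 4 in the (x, y, w, h, conf, class_prob, class_id) format)
    meets the threshold."""
    return [box for box in boxes if box[4] >= nms_thresh]

# Two toy boxes: only the first survives a threshold of 0.3
boxes = [(0.5, 0.5, 0.2, 0.3, 0.9, 0.8, 0),
         (0.4, 0.6, 0.1, 0.1, 0.2, 0.5, 2)]
print(len(filter_by_confidence(boxes, 0.3)))  # 1
```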
Setting the Intersection Over Union Threshold¶
After removing the predicted bounding boxes with a low detection probability, the second step in NMS is to select the bounding boxes with the highest detection probability and eliminate all bounding boxes whose Intersection Over Union (IOU) with them is higher than a given IOU threshold. In the code below, we set this IOU threshold to 0.4. This means that all predicted bounding boxes that have an IOU value greater than 0.4 with respect to the best bounding boxes will be removed.
In the utils module you will find the nms function, which performs the second step of Non-Maximal Suppression, and the boxes_iou function, which calculates the Intersection over Union of two given bounding boxes. You are encouraged to look at these functions to see how they work.
# Set the IOU threshold
iou_thresh = 0.4
nms_thresh = 0.3
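For illustration, here is a minimal IoU computation on boxes given as (x1, y1, x2, y2) corner coordinates. The boxes_iou function in the utils module may use a different box format; this sketch only shows the underlying idea.

```python
def box_iou(a, b):
    """IoU of two boxes in (x1, y1, x2, y2) corner format."""
    # Corners of the intersection rectangle
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    # Clamp to zero when the boxes do not overlap
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

print(box_iou((0, 0, 10, 10), (5, 5, 15, 15)))  # 25 / 175 ≈ 0.143
```

With iou_thresh = 0.4, two boxes that overlap this much (IoU ≈ 0.14) would both be kept; heavily overlapping duplicates of the same object would not.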
Object Detection¶
Once the image has been loaded and resized, and you have chosen your parameters for nms_thresh and iou_thresh, we can use the YOLO algorithm to detect objects in the image. We detect the objects using the detect_objects(m, resized_image, iou_thresh, nms_thresh) function from the utils module. This function takes in the model m returned by Darknet, the resized image, and the NMS and IOU thresholds, and returns the bounding boxes of the objects found.
Each bounding box contains 7 parameters: the coordinates (x, y) of the center of the bounding box, the width w and height h of the bounding box, the detection confidence, the object class probability, and the object class id. The detect_objects() function also prints out the time it took for the YOLO algorithm to detect the objects in the image and the number of objects detected. Since we are running the algorithm on a CPU, it takes about 2 seconds to detect the objects in an image; if we were to use a GPU, it would run much faster.
Once we have the bounding boxes of the objects found by YOLO, we can print the class of the objects found and their corresponding object class probability. To do this we use the print_objects() function in the utils module.
Finally, we use the plot_boxes() function to plot the bounding boxes and corresponding object class labels found by YOLO in our image. If you set the plot_labels flag to False, the bounding boxes are displayed with no labels. This makes it easier to view the bounding boxes when your nms_thresh is too low. The plot_boxes() function uses the same color for the bounding boxes of the same object class. However, if you want all bounding boxes to be the same color, you can use the color keyword to set the desired color. For example, if you want all the bounding boxes to be red you can use:
plot_boxes(original_image, boxes, class_names, plot_labels = True, color = (1,0,0))
You are encouraged to change the iou_thresh and nms_thresh parameters to see how they affect the YOLO detection algorithm. The values of iou_thresh = 0.4 and nms_thresh = 0.3 work well to detect objects in different kinds of images. In the cell below, we have repeated some of the earlier code so that you don't have to scroll up and down when you want to change the iou_thresh and nms_thresh parameters or the image. Have fun!
iou_thresh = 0.4
nms_thresh = 0.3
# Detect objects in the image
boxes = detect_objects(m, resized_image, iou_thresh, nms_thresh)
# Print the objects found and the confidence level
print_objects(boxes, class_names)
#Plot the image with bounding boxes and corresponding object class labels
plot_boxes(original_image, boxes, class_names, plot_labels = True)
It took 2.348 seconds to detect the objects in the image.
Number of Objects Detected: 39
Objects Found and Confidence Level:
1. person: 0.999996
2. person: 1.000000
3. car: 0.707238
4. truck: 0.933031
5. car: 0.658085
6. truck: 0.666981
7. person: 1.000000
8. traffic light: 1.000000
9. person: 1.000000
10. car: 0.997369
11. bus: 0.998023
12. person: 1.000000
13. person: 1.000000
14. person: 1.000000
15. person: 1.000000
16. person: 1.000000
17. traffic light: 1.000000
18. traffic light: 1.000000
19. handbag: 0.997282
20. traffic light: 1.000000
21. car: 0.989741
22. traffic light: 1.000000
23. traffic light: 0.999999
24. person: 0.999999
25. truck: 0.715037
26. traffic light: 1.000000
27. person: 0.999993
28. person: 0.999996
29. person: 0.999913
30. person: 0.999995
31. person: 0.999851
32. traffic light: 0.999520
33. person: 0.999997
34. traffic light: 1.000000
35. person: 0.756844
36. person: 0.967352
37. motorbike: 0.536371
38. traffic light: 0.999992
39. person: 0.999998
import numpy as np

img = original_image.copy()
width = img.shape[1]
height = img.shape[0]
DetectedList = []
for i in range(len(boxes)):
    box = boxes[i]
    # Get the (x, y) pixel coordinates of the top-left (x1, y1) and
    # bottom-right (x2, y2) corners of the bounding box. The box
    # coordinates are normalized, so we scale them by the image size.
    x1 = int(np.around((box[0] - box[2] / 2.0) * width))
    y1 = int(np.around((box[1] - box[3] / 2.0) * height))
    x2 = int(np.around((box[0] + box[2] / 2.0) * width))
    y2 = int(np.around((box[1] + box[3] / 2.0) * height))
    if len(box) >= 7 and class_names:
        cls_conf = box[5]
        cls_id = box[6]
        print('%i. %s: %f' % (i + 1, class_names[cls_id], float(cls_conf)))
        print(f"left : {x1} ; top : {y1} ; right : {x2} ; bottom : {y2}")
        d = {}
        d["objectname"] = class_names[cls_id]
        d["confident"] = float(cls_conf)
        d["pos"] = [x1, y1, x2, y2]
        DetectedList.append(d)
1. person: 0.999996 left : 4204 ; top : 2566 ; right : 4506 ; bottom : 3227
2. person: 1.000000 left : 328 ; top : 2618 ; right : 676 ; bottom : 3211
3. car: 0.707238 left : 2652 ; top : 2529 ; right : 2911 ; bottom : 2752
4. truck: 0.933031 left : 3650 ; top : 2515 ; right : 4086 ; bottom : 2891
5. car: 0.658085 left : 2283 ; top : 2533 ; right : 2577 ; bottom : 2828
6. truck: 0.666981 left : 1644 ; top : 2504 ; right : 2074 ; bottom : 2899
7. person: 1.000000 left : 3181 ; top : 2592 ; right : 3396 ; bottom : 3051
8. traffic light: 1.000000 left : 235 ; top : 1837 ; right : 374 ; bottom : 2206
9. person: 1.000000 left : 1148 ; top : 2559 ; right : 1276 ; bottom : 2853
10. car: 0.997369 left : 2036 ; top : 2582 ; right : 2311 ; bottom : 2844
11. bus: 0.998023 left : 1549 ; top : 2413 ; right : 2254 ; bottom : 2759
12. person: 1.000000 left : 886 ; top : 2561 ; right : 990 ; bottom : 2848
13. person: 1.000000 left : 1300 ; top : 2565 ; right : 1405 ; bottom : 2863
14. person: 1.000000 left : 1347 ; top : 2564 ; right : 1453 ; bottom : 2862
15. person: 1.000000 left : 768 ; top : 2557 ; right : 865 ; bottom : 2846
16. person: 1.000000 left : 4611 ; top : 2562 ; right : 4740 ; bottom : 2895
17. traffic light: 1.000000 left : 1593 ; top : 2072 ; right : 1666 ; bottom : 2251
18. traffic light: 1.000000 left : 3566 ; top : 2184 ; right : 3630 ; bottom : 2287
19. handbag: 0.997282 left : 564 ; top : 2781 ; right : 671 ; bottom : 2957
20. traffic light: 1.000000 left : 2485 ; top : 2321 ; right : 2519 ; bottom : 2378
21. car: 0.989741 left : 3469 ; top : 2599 ; right : 3643 ; bottom : 2837
22. traffic light: 1.000000 left : 2843 ; top : 2393 ; right : 2869 ; bottom : 2443
23. traffic light: 0.999999 left : 3822 ; top : 2195 ; right : 3882 ; bottom : 2283
24. person: 0.999999 left : 4477 ; top : 2560 ; right : 4588 ; bottom : 2872
25. truck: 0.715037 left : 2881 ; top : 2510 ; right : 3063 ; bottom : 2703
26. traffic light: 1.000000 left : 2468 ; top : 2226 ; right : 2508 ; bottom : 2321
27. person: 0.999993 left : 1061 ; top : 2561 ; right : 1148 ; bottom : 2840
28. person: 0.999996 left : 4134 ; top : 2580 ; right : 4356 ; bottom : 3135
29. person: 0.999913 left : 1000 ; top : 2566 ; right : 1088 ; bottom : 2837
30. person: 0.999995 left : 689 ; top : 2559 ; right : 790 ; bottom : 2882
31. person: 0.999851 left : 675 ; top : 2475 ; right : 4470 ; bottom : 2887
32. traffic light: 0.999520 left : 2792 ; top : 2438 ; right : 2820 ; bottom : 2471
33. person: 0.999997 left : 598 ; top : 2548 ; right : 716 ; bottom : 2896
34. traffic light: 1.000000 left : 4118 ; top : 1379 ; right : 4232 ; bottom : 1645
35. person: 0.756844 left : 3047 ; top : 2599 ; right : 3119 ; bottom : 2732
36. person: 0.967352 left : 294 ; top : 2630 ; right : 431 ; bottom : 3050
37. motorbike: 0.536371 left : 2571 ; top : 2582 ; right : 2631 ; bottom : 2735
38. traffic light: 0.999992 left : 1026 ; top : 2449 ; right : 1090 ; bottom : 2508
39. person: 0.999998 left : 4998 ; top : 2559 ; right : 5078 ; bottom : 2861
DetectedList
[{'objectname': 'person',
'confident': 0.9999955892562866,
'pos': [4204, 2566, 4506, 3227]},
{'objectname': 'person',
'confident': 0.9999998807907104,
'pos': [328, 2618, 676, 3211]},
{'objectname': 'car',
'confident': 0.7072378396987915,
'pos': [2652, 2529, 2911, 2752]},
{'objectname': 'truck',
'confident': 0.9330310821533203,
'pos': [3650, 2515, 4086, 2891]},
{'objectname': 'car',
'confident': 0.6580854058265686,
'pos': [2283, 2533, 2577, 2828]},
{'objectname': 'truck',
'confident': 0.6669811010360718,
'pos': [1644, 2504, 2074, 2899]},
{'objectname': 'person', 'confident': 1.0, 'pos': [3181, 2592, 3396, 3051]},
{'objectname': 'traffic light',
'confident': 1.0,
'pos': [235, 1837, 374, 2206]},
{'objectname': 'person',
'confident': 0.9999998807907104,
'pos': [1148, 2559, 1276, 2853]},
{'objectname': 'car',
'confident': 0.9973688125610352,
'pos': [2036, 2582, 2311, 2844]},
{'objectname': 'bus',
'confident': 0.9980230331420898,
'pos': [1549, 2413, 2254, 2759]},
{'objectname': 'person',
'confident': 0.9999998807907104,
'pos': [886, 2561, 990, 2848]},
{'objectname': 'person', 'confident': 1.0, 'pos': [1300, 2565, 1405, 2863]},
{'objectname': 'person',
'confident': 0.9999995231628418,
'pos': [1347, 2564, 1453, 2862]},
{'objectname': 'person',
'confident': 0.9999998807907104,
'pos': [768, 2557, 865, 2846]},
{'objectname': 'person',
'confident': 0.9999997615814209,
'pos': [4611, 2562, 4740, 2895]},
{'objectname': 'traffic light',
'confident': 1.0,
'pos': [1593, 2072, 1666, 2251]},
{'objectname': 'traffic light',
'confident': 1.0,
'pos': [3566, 2184, 3630, 2287]},
{'objectname': 'handbag',
'confident': 0.9972816705703735,
'pos': [564, 2781, 671, 2957]},
{'objectname': 'traffic light',
'confident': 1.0,
'pos': [2485, 2321, 2519, 2378]},
{'objectname': 'car',
'confident': 0.9897407293319702,
'pos': [3469, 2599, 3643, 2837]},
{'objectname': 'traffic light',
'confident': 0.9999995231628418,
'pos': [2843, 2393, 2869, 2443]},
{'objectname': 'traffic light',
'confident': 0.9999986886978149,
'pos': [3822, 2195, 3882, 2283]},
{'objectname': 'person',
'confident': 0.9999992847442627,
'pos': [4477, 2560, 4588, 2872]},
{'objectname': 'truck',
'confident': 0.7150365114212036,
'pos': [2881, 2510, 3063, 2703]},
{'objectname': 'traffic light',
'confident': 1.0,
'pos': [2468, 2226, 2508, 2321]},
{'objectname': 'person',
'confident': 0.999993085861206,
'pos': [1061, 2561, 1148, 2840]},
{'objectname': 'person',
'confident': 0.9999961853027344,
'pos': [4134, 2580, 4356, 3135]},
{'objectname': 'person',
'confident': 0.9999125003814697,
'pos': [1000, 2566, 1088, 2837]},
{'objectname': 'person',
'confident': 0.9999953508377075,
'pos': [689, 2559, 790, 2882]},
{'objectname': 'person',
'confident': 0.9998505115509033,
'pos': [675, 2475, 4470, 2887]},
{'objectname': 'traffic light',
'confident': 0.999519944190979,
'pos': [2792, 2438, 2820, 2471]},
{'objectname': 'person',
'confident': 0.9999966621398926,
'pos': [598, 2548, 716, 2896]},
{'objectname': 'traffic light',
'confident': 0.9999996423721313,
'pos': [4118, 1379, 4232, 1645]},
{'objectname': 'person',
'confident': 0.756843626499176,
'pos': [3047, 2599, 3119, 2732]},
{'objectname': 'person',
'confident': 0.967352032661438,
'pos': [294, 2630, 431, 3050]},
{'objectname': 'motorbike',
'confident': 0.5363708734512329,
'pos': [2571, 2582, 2631, 2735]},
{'objectname': 'traffic light',
'confident': 0.999991774559021,
'pos': [1026, 2449, 1090, 2508]},
{'objectname': 'person',
'confident': 0.9999980926513672,
'pos': [4998, 2559, 5078, 2861]}]
import json

# Save the detections to a JSON file
file_name = "ObjectDetection.json"
with open(file_name, "w") as fid:
    json.dump(DetectedList, fid)

# Read the detections back and print them
with open(file_name, "r") as read_file:
    data = json.load(read_file)
print(data)
[{'objectname': 'person', 'confident': 0.9999955892562866, 'pos': [4204, 2566, 4506, 3227]}, {'objectname': 'person', 'confident': 0.9999998807907104, 'pos': [328, 2618, 676, 3211]}, {'objectname': 'car', 'confident': 0.7072378396987915, 'pos': [2652, 2529, 2911, 2752]}, {'objectname': 'truck', 'confident': 0.9330310821533203, 'pos': [3650, 2515, 4086, 2891]}, {'objectname': 'car', 'confident': 0.6580854058265686, 'pos': [2283, 2533, 2577, 2828]}, {'objectname': 'truck', 'confident': 0.6669811010360718, 'pos': [1644, 2504, 2074, 2899]}, {'objectname': 'person', 'confident': 1.0, 'pos': [3181, 2592, 3396, 3051]}, {'objectname': 'traffic light', 'confident': 1.0, 'pos': [235, 1837, 374, 2206]}, {'objectname': 'person', 'confident': 0.9999998807907104, 'pos': [1148, 2559, 1276, 2853]}, {'objectname': 'car', 'confident': 0.9973688125610352, 'pos': [2036, 2582, 2311, 2844]}, {'objectname': 'bus', 'confident': 0.9980230331420898, 'pos': [1549, 2413, 2254, 2759]}, {'objectname': 'person', 'confident': 0.9999998807907104, 'pos': [886, 2561, 990, 2848]}, {'objectname': 'person', 'confident': 1.0, 'pos': [1300, 2565, 1405, 2863]}, {'objectname': 'person', 'confident': 0.9999995231628418, 'pos': [1347, 2564, 1453, 2862]}, {'objectname': 'person', 'confident': 0.9999998807907104, 'pos': [768, 2557, 865, 2846]}, {'objectname': 'person', 'confident': 0.9999997615814209, 'pos': [4611, 2562, 4740, 2895]}, {'objectname': 'traffic light', 'confident': 1.0, 'pos': [1593, 2072, 1666, 2251]}, {'objectname': 'traffic light', 'confident': 1.0, 'pos': [3566, 2184, 3630, 2287]}, {'objectname': 'handbag', 'confident': 0.9972816705703735, 'pos': [564, 2781, 671, 2957]}, {'objectname': 'traffic light', 'confident': 1.0, 'pos': [2485, 2321, 2519, 2378]}, {'objectname': 'car', 'confident': 0.9897407293319702, 'pos': [3469, 2599, 3643, 2837]}, {'objectname': 'traffic light', 'confident': 0.9999995231628418, 'pos': [2843, 2393, 2869, 2443]}, {'objectname': 'traffic light', 'confident': 0.9999986886978149, 'pos': [3822, 2195, 3882, 2283]}, {'objectname': 'person', 'confident': 0.9999992847442627, 'pos': [4477, 2560, 4588, 2872]}, {'objectname': 'truck', 'confident': 0.7150365114212036, 'pos': [2881, 2510, 3063, 2703]}, {'objectname': 'traffic light', 'confident': 1.0, 'pos': [2468, 2226, 2508, 2321]}, {'objectname': 'person', 'confident': 0.999993085861206, 'pos': [1061, 2561, 1148, 2840]}, {'objectname': 'person', 'confident': 0.9999961853027344, 'pos': [4134, 2580, 4356, 3135]}, {'objectname': 'person', 'confident': 0.9999125003814697, 'pos': [1000, 2566, 1088, 2837]}, {'objectname': 'person', 'confident': 0.9999953508377075, 'pos': [689, 2559, 790, 2882]}, {'objectname': 'person', 'confident': 0.9998505115509033, 'pos': [675, 2475, 4470, 2887]}, {'objectname': 'traffic light', 'confident': 0.999519944190979, 'pos': [2792, 2438, 2820, 2471]}, {'objectname': 'person', 'confident': 0.9999966621398926, 'pos': [598, 2548, 716, 2896]}, {'objectname': 'traffic light', 'confident': 0.9999996423721313, 'pos': [4118, 1379, 4232, 1645]}, {'objectname': 'person', 'confident': 0.756843626499176, 'pos': [3047, 2599, 3119, 2732]}, {'objectname': 'person', 'confident': 0.967352032661438, 'pos': [294, 2630, 431, 3050]}, {'objectname': 'motorbike', 'confident': 0.5363708734512329, 'pos': [2571, 2582, 2631, 2735]}, {'objectname': 'traffic light', 'confident': 0.999991774559021, 'pos': [1026, 2449, 1090, 2508]}, {'objectname': 'person', 'confident': 0.9999980926513672, 'pos': [4998, 2559, 5078, 2861]}]
import matplotlib.patches as patches

display_image = original_image.copy()
# Create a figure and plot the image
fig, a = plt.subplots(1, 1)
a.imshow(display_image)

for obj in data:
    if obj["objectname"] in ["car", "truck"]:
        print(obj)
        # Calculate the width and height of the bounding box in pixels.
        x1, y1, x2, y2 = obj["pos"]
        width_x = x2 - x1
        width_y = y1 - y2
        # Set the position and size of the bounding box. (x1, y2) is the pixel
        # coordinate of the lower-left corner of the bounding box as displayed
        # (imshow puts the y-axis origin at the top of the image).
        rect = patches.Rectangle((x1, y2),
                                 width_x, width_y,
                                 linewidth=2,
                                 edgecolor='r',
                                 facecolor='none')
        # Draw the bounding box on top of the image
        a.add_patch(rect)
        # Create a string with the object class name and class probability
        conf_tx = obj["objectname"] + ': {:.1f}'.format(obj["confident"])
        # Define x and y offsets for the labels
        lxc = (img.shape[1] * 0.266) / 100
        lyc = (img.shape[0] * 1.180) / 100
        # Draw the label on top of the image
        a.text(x1 + lxc, y1 - lyc, conf_tx, fontsize=24, color='k',
               bbox=dict(facecolor='b', edgecolor='b', alpha=0.8))

plt.show()
{'objectname': 'car', 'confident': 0.7072378396987915, 'pos': [2652, 2529, 2911, 2752]}
{'objectname': 'truck', 'confident': 0.9330310821533203, 'pos': [3650, 2515, 4086, 2891]}
{'objectname': 'car', 'confident': 0.6580854058265686, 'pos': [2283, 2533, 2577, 2828]}
{'objectname': 'truck', 'confident': 0.6669811010360718, 'pos': [1644, 2504, 2074, 2899]}
{'objectname': 'car', 'confident': 0.9973688125610352, 'pos': [2036, 2582, 2311, 2844]}
{'objectname': 'car', 'confident': 0.9897407293319702, 'pos': [3469, 2599, 3643, 2837]}
{'objectname': 'truck', 'confident': 0.7150365114212036, 'pos': [2881, 2510, 3063, 2703]}
Exercise 2 (optional): Replace the image used for object detection with a different image of your choice.¶